Sigma method for the microcanonical entropy or density of states 
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We introduce a simple improvement on the method to calculate equilibrium entropy differences 
between classical energy levels proposed by Davis (S. Davis, Phys. Rev. E, 050101, 2011). We 
demonstrate that the modification is superior to the original whenever the energy levels are suffi- 
ciently closely spaced or whenever the microcanonical averaging needed in the method is carried 
out by importance sampling Monte Carlo. We also point out the necessary adjustments if Davis's 
method (improved or not) is to be used with molecular dynamics simulations. 



Consider a system with configurational coordinates 
and potential energy function [/({r^}). The Hamil- 
tonian of the system is of the standard classical form, that 
is, separable in its coordinates and conjugate momenta, 
{Pi}, so that it may be written thus 

H({r i },{p i }) = U({r i })+K({p i }) (1) 

where K({pi}) is the kinetic energy of the system. In 
this Brief Report, we consider the calculation of the 
energy dependence of the microcanonical Boltzmann- 
Planck equilibrium entropy, 

S(E) = fclnw(-E') (2) 

where k is Boltzmann's constant and 

cv(E) = C J {dnHdpiWE - H({Ti}, {Pi})) (3) 

is the phase density, also known as the density of states, 
where C is a constant that assures ui(E) is dimensionless 
and 6 is Dirac's 5 function. Algorithms to evaluate ui(E) 
(and thus S(E)) abound in the literature [HHT]. They 
each have their advantages and drawbacks, and an ex- 
haustive review of them all is not possible in this Brief 
Report. Here, we instead focus in particular on the re- 
cently proposed a method by Davis [18 . We will reca- 
pitulate its derivation and offer an improvement on the 
original method. Our notation differs slightly from that 
of Davis. 

For the moment, we consider a microcanonical ensem- 
ble whose only first integral of motion is the total energy, 
E. We will briefly consider the case with more first inte- 
grals of motion later. The Laplace principle of indiffer- 
ence assigns equal a priori probability to all phase space 
points on the energy shell H({ri}, {pi}) = E. In other 
words, the ensemble probability density is constant on 
this energy shell and zero everywhere else. We write this 
probability density as 

WeUt,}, { Pi }) = -£=r6(E - Hdr,}, { Pl })) (4) 
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If K({pi}) is quadratic in each conjugate momentum co- 
ordinate and shows no complicated interdependencies (in 
this equation {mi} are generalized masses), 

i 

the dependence on {pi} can be integrated out, yielding 

[IBM], 

^({r,}) =C' E (E— [/({rj)f /2 - x e(E-U({r t })) (6) 
where 

C' E = (| {dnUE - C/({ ri }))™/2-i e (E - f/({r,}))) 

. (7) 

is a normalization constant that is inversely proportional 
to u)(E), is the Heaviside step function, and n is the 
number of configurational degrees of freedom of the sys- 
tem (which for an unconstrained particle system is three 
times the number of particles in three dimensions). The 
quantity W^-fr,}) is directly proportional to the density 
of kinetic energy states. The microcanonical average of a 
quantity A({ri}) that does not depend explicitly on the 
momenta can now be expressed as, 

(A({r i })) E = |{drJP? £ ({rJP4({rJ). (8) 

The probability function in eq. ^ can be used as the 
weighting factor in a microcanonical Markov chain Monte 
Carlo simulation [TM2T] to calculate averages according 
to eq. |[8]). In a molecular dynamics simulation, how- 
ever, in which additional integrals of motion appear, the 
probability function of eq. ^ is not the proper one, as- 
sumed ergodicity notwithstanding. In this case, the cor- 
rect probability function is given by [22], 

WeUt,}) =C e [E- U({r t }) —J (9) 

where P is the center-of-mass momentum, M the to- 
tal mass, Ce a normalization constant and n carries the 
same meaning as in eq. ^ but does not correspond to the 
same numerical value, there being one degree of freedom 



2 



less for each Cartesian component of the center-of-mass 
momentum. 

At this point, Davis |18| introduces the quantity 



additional first integrals, in which case one may intro- 
duce the function, 



_ e(E r - Ufa})) 

VE.EAM)- {E _ U{{T . }))n /2-l 



(10) 



(E> - [/({rj))"'/ 2 - 1 
(E"-U({v. l })~^ i y^~ 

xeiE'-uttn})) 



(16) 



with the condition that E x < E, but E T otherwise ar- an(1 calculate entropies by eq. (fl3|. Here n' exceeds 



bitrary. When the microcanonical average of eq. ( 10 ) is 
calculated using eq. ^ and eq. ^ , keeping in mind that 
C' E oc 1/uj(E), it is seen that that the entropy difference 
according to eq. ^ between two energy levels E' and E" 
is given by, 



E (cT E „, Et {{Ti}))E» 



(11) 



which is the rational for introducing the a function. Sim- 
ilar to Davis's procedure, let us introduce the quantity 

*E»M{n}) = 7^SS£e(£' - Ufa})) 



{E» - Ufa t }))^ 
with E' < E". We may then write 

A|;'5 = -fcln(S £ » E ,({r i })> B „ 



(12) 



(13) 



The proof of this equation follows directly from the sub- 
stitution of eq. (12) into eq. (13), whence eq. ^ can be 
identified and, after using the inverse proportionality be- 
tween C' E and U)(E), this leads to eq. ^ in difference 
form. Clearly, both eqs (11) and ( fi"3| ) may be used to 
calculate the entropy difference. Shortly, we will con- 
sider the question of which function is the most efficient 
from a computational perspective. 

The above equations are to be used when the total 
energy is the only integral of motion in the mechanical 
system. For completeness, we note the form that the 
corresponding sigma functions must take when the aver- 
aging is done by molecular dynamics means, if the objec- 
tive is to obtain the density of states. In this case, the a 
function becomes 



<7E,E T fa}) = 



e(£ r - Ufa}) 



2M ' 



(E- Ufa}) 
and the £ function is to be replaced by, 



(14) 



^E",E'fa}) = 



{E'-ufa})-^ri^ 

(E»-Ufa})-^/*-i 



Q(E'-Ufa}) 



p2N 

2MJ 



(15) 



Once again, heed must be paid to the value of n so that 
the subtraction of the center-of-mass momentum degrees 
of freedom is accounted for. In every other respect, the 
equations for the entropy differences remain formally un- 
changed. Quite conceivably, one might want to extract 
the corresponding entropy of the system without these 



by the number of Cartesian components in the center-of- 
mass momentum. The corresponding form for eq. (11) 
follows by analogy. 

We now turn to an analysis of the relative computa- 
tional merits of eqs (fTTl) and (13). To obtain S(E) as 



a (quasi-)continuous function of E, the calculation may 
be subdivided into N discrete energy segments over a 
predefined energy range. For instance, if the energy 
interval [E',E"], is subdivided into N segments sepa- 
rated at energies {Ei} such that E = E',Ei < E i+ i; 
i — 0, 1, . . . , N — 1; and E N — E", then the total entropy 
difference is given as a sum over the individual entropy 
differences for each segment, 



a* :;s = 



N-l 



(17) 



With eq. (11), N + 1, and with eq. (13), N averages 
are needed. This difference becomes negligible for large 
N, which often corresponds to the most interesting situa- 
tions. In the limit N — > oo, keeping E' and E" fixed, this 
gives S(E) as a continuous function of E in the interval 
[£", E"\, We now note that as N -> oo, E i+1 - Ei 
and in this limit, Y,E i+1 .Ei({^j}) —> 1 for all {r 3 } ac- 
cessible in the microcanonical ensemble (that is, for all 
{rj} such that U({rj}) < E i+ i) and because the E func- 
tion being averaged becomes identically unity, the aver- 
age (EE i+1 ,Ei({ r j}))Ei + i loses all statistical uncertainty. 
Before we continue, we note that this quality is not as- 
sured for the averages over the corresponding a functions, 
as they do not enjoy the same guarantee, and even less 
so their ratio. 

In order to complete and strengthen the general argu- 
ment, we should sum up and consider the uncertainties 
of all the N individual averages. Therefore, considering 
the rate by which the averages of the S functions ap- 
proach unity (and lose their statistical uncertainty), is of 
importance. Rearranging eq. (17), it is clear that 



ln(E, 



,Ei Ea. 



kN 



(18) 



In other words, the logarithm of each individual S aver- 
age tends to zero inversely proportionally to N. In the 
numerical implementation, errors will accrue if the ratio 
on the right-hand side becomes of the order of the numer- 
ical precision. To keep the notation as simple as possible, 
we temporarily restrict our attention to n = 2, in which 
case the formulae are drastically simplified. In this case, 
for instance, the uncertainty of the averages may be esti- 
mated from (Q(Ei — U({rj})))E i+1 - For TV large enough, 
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the converged value of (Q(Ei — U({rj})))E i+1 will be very 
close to, but slightly less than, unity. With finite statis- 
tics, we estimate this value to be aj. Because of our 
choice of n = 2, the average in question is composed only 
of terms being either unity or zero. If there are Mi terms 
equal to unity and m, terms equal to zero sampled in the 
numerical averaging, then a, = Mi/ (Mi + mi). In the 
"worst case scenario", the statistics of the ensemble av- 
eraging is so poor (because of Mi + im being chosen too 
small) that a* is virtually a non-uniform random number 
between zero and one. This value is thus different from 
the actual converged value, which we denote fa. The 
average magnitude of this error is a measure of the un- 
certainty in the averaging. The variance of the relative 
error, for instance, can be formulated as 1231 . 



Var 



- 1 = 



Inserting the expression for We 1+1 , 
tegral on the right-hand side is, 



we find that the in- 



J d{r,}^ l+1 ({r,})e(^ - U({r,})) = 

fd{r' j }Q(E i +A If E-U({T' j })) 
where we have introduced A^E = (E" — E')/N. 



(20) 



We cannot hope to solve the integral in eq. (201 in 



the general case, and like this obtain the variance as an 
explicit function of N . There are, however, some con- 
clusions to be drawn from the general form of the right- 
hand side. In molecular systems, the accessible config- 
uration space generally increases superlinearly with in- 
creasing potential energy. Hence, the integral in the de- 
nominator of eq. ( 20 ) should increase superlinearly with 
increasing A^E. It follows immediately, that the vari- 



ance according to eq. ( 19 1 should decrease superlinearly 



with decreasing AjyE cx N 1 or, in other words 



Jd{r;}e(£ 1 + A«i?-[/({r;})) 



l + 0(N- a ) (21) 



where a > 1 is undetermined (but assuredly greater than 
unity). Thus, the total variance (given as N times the in- 
dividual variance) will decrease to zero as 0(N 1 ~ a ) when 
N — > oo. It follows that for a sufficiently finely meshed 
energy grid, the £ function will always be computation- 
ally more efficient than the a function, regardless of the 
complexity of the system, as long as its accessible config- 
uration space increases superlinearly with increasing po- 
tential energy. The general argument, but with clumsier 
notation, can be carried through also with n^2. 

In the numerical implementation, the limit N — > oo 
may of course not be reached exactly and so a superior 



computational efficacy in the numerically very demand- 
ing N — > oo limit is not necessarily relevant in actual cal- 
culations. We must therefore also consider the relative 



efficacy of eqs (111 and (13 1 for finite energy differences. 



As discussed by Davis 18J, in the case of eq. (11), the 
constant E T must be chosen so that both averages under 
the logarithm are calculated with enough statistics. Too 
small values of E r restrict the statistics sampled, as not 
enough sampled configurations do then have potential 
energies [/({r^}) < E T . At the same time E r must be 
less than or equal to the smallest of the two energies for 
which the entropy difference is calculated, meaning that 
a too large energy gap will be detrimental to the statistics 
of the higher energy average. This essentially introduces 
an upper bound for the energy difference for which the 
entropy difference can be reliably calculated. As noted 
by Davis, this upper bound will depend on the size of 
the system, because the fluctuations in potential energy 
become smaller, the larger the system is (in the sense of 
the value of n). We note that a similar restriction (for 
the same reasons) applies to eq. (131, in which E' takes 
the place of E T , 

To consider the question of convergence of the averages 
in more detail we restrict our attention somewhat and 
assume that the microcanonical statistics are sampled by 
importance sampling Metropolis Monte Carlo according 
to the probability function WE({^i}) [21]. In this case, 
a statistically good estimate of the average of a function 
yl({rj}) is obtained if ^4({ri}) contributes appreciably in 
regions where We^i}) is large, and likewise contributes 
negligibly in regions where Wsd^i}) is close to zero. The 
question thus reduces to which of the two sigma functions 
is most "similar" to Ws({ri}), in the sense that they 
share the domains where they are both of appreciable 
magnitude. For instance, consider the ratios between the 
sigma functions and the Markov weighting function, 



Ss",fi<({ri» 
W E »({ri}) 



PE",E t ({*i}) 



(E> - U({v l })) n/2 - 1 

C E „ (E"-u({n})r- 

Q(E'-U({v l })) 

e(E"-u({Ti})y 
e(E r - uan})) 



w E „({vi}) o E „(E» -u({ri})y 



o-E',E T ({ r i}) 6(£ r - U({ri})) 



WE'dVi}) C' B ,(E>-U({vi})Y- 



(22) 



(23) 



(24) 



The less similar the two functions are, the less constant 
is their ratio. In the simplest case, the limiting case of 
an ideal gas, the {r^} gradients vanish for all of these 
ratios and the relative qualities of the importance sam- 
pling of the averages are not distinguishable between the 
<7 and £ functions. When interactions are present, this 
is no longer the case. Whereas the resulting ratios of 
both eqs. (23) and (24) consist of a practically constant 
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FIG. 1. Plot of the difference between either of the two sigma 
functions for specific and arbitrarily chosen E" — 2,E' = 
E T = 1 and U (r) = r 2 and the importance sampling function 
(with n = 3) used in microcanonical Monte Carlo simulations. 
In this plot, C' E ,i is arbitrarily set to C' E ,, = 1. 



numerator and a monotonously and smoothly decreasing 
denominator as a function of U({rj}), the ratio between 

the T±e",e> function and We>< presents a smoothly de- 
creasing function of U for both numerator and denomi- 
nator. Hence, it would seem that this ratio is more invari- 
ant with respect to changes in {r^} (and hence U({rj})) 
than the others. This becomes, once again, particularly 
pronounced when E' — > E" . A more-or-less constant 
difference can be used just as well as an indication of 
similarity and this is what we consider in Fig. [T] in the 
case of a three-dimensional harmonic oscillator for which 
U(r) = r 2 . This is a model potential for atomic crys- 



tals and makes for a reasonably relevant comparison. As 
anticipated, the difference Yie».e' ~ We" exhibits much 
less variation than <7e",e' — We»- Above E = E' = E T , 
they become identical. 

We do not offer any numerical experiments to illustrate 
the method. This has already been achieved by Davis [TH] 
on a non-trivial system using the a function. However, 
the numerical upper limitation on the size of tractable 
systems that Davis points out is nonetheless important 
to recall. A similar limitation, although much less se- 
vere, is present already in the microcanonical sampling 
algorithm [TM2T] . as acceptance probabilities for a trial 
move taking the system from potential energy U' to U" 
at total energy E are proportional to the ratio 



E-U" 



n/2-l 

E-WJ ^ E - U,,) 
which for large n values ought to become difficult for the 
computer architecture to resolve to sufficient accuracy, 
as the acceptance ratio takes on a more "step function" - 
like form. Severin et al. [19] initially introduced the 
sampling algorithm for sampling the internal degrees of 
freedom of single molecules. Obtaining the density of 
states of complicated polyatomics, needed for instance in 
statistical reaction rate theories, is thus a natural appli- 
cation of a method such as this. Nevertheless, Ray [2T] 
reports comfortable simulations on up to 500 particles, 
using this microcanonical sampling. Such system sizes 
should be sufficient for many purposes in statistical me- 
chanics. 

In conclusion, we note one interesting formal property 
of the £ averages: from a single microcanonical molecular 
dynamics (or Monte Carlo) run, in principle the entire 
S(E) function is obtainable (up to an additive constant). 
This follows since the energy E' is arbitrary in eq. (13 1, 



yet does not affect the dynamics. Nevertheless, it is clear 
from the limitations discussed above, that good statis- 
tics would only be achieved in a narrow range below E" . 
However, for very small systems, this range might be 
quite broad. 
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